A Near-Optimal Poly-Time Algorithm for Learning a class of Stochastic Games

Authors

  • Ronen I. Brafman
  • Moshe Tennenholtz
Abstract

We present a new algorithm for polynomial-time learning of near-optimal behavior in stochastic games. This algorithm incorporates and integrates important recent results of Kearns and Singh [1998] in reinforcement learning and of Monderer and Tennenholtz [1997] in repeated games. In stochastic games we face an exploration vs. exploitation dilemma more complex than in Markov decision processes: given information about particular parts of a game matrix, how much effort should the agent invest in learning its unknown parts? We explain and address these issues within the class of single-controller stochastic games. This solution can be extended to stochastic games in general.

Similar resources

A near-optimal polynomial time algorithm for learning in certain classes of stochastic games

We present a new algorithm for polynomial-time learning of optimal behavior in single-controller stochastic games. This algorithm incorporates and integrates important recent results of Kearns and Singh [5] in reinforcement learning and of Monderer and Tennenholtz [7] in repeated games. In stochastic games, the agent must cope with the existence of an adversary whose actions can be arbitrary. In ...


Near-Minimum-Time Motion Planning of Manipulators along Specified Path

The large amount of computation needed to obtain a time-optimal solution for moving a manipulator along a specified path has made it impossible to introduce an online time-optimal control algorithm. Most of this computational burden is due to the calculation of switching points. In this paper, a learning algorithm is proposed for finding the switching points. The method, which can be used for both ...


A Near-Optimal Polynomial Time Algorithm for Learning in Stochastic Games

We present a new algorithm for polynomial-time learning of optimal behavior in stochastic games. This algorithm incorporates and integrates important recent results of Kearns and Singh [5] in reinforcement learning and of Monderer and Tennenholtz [7] in repeated games. In stochastic games, the agent must cope with the existence of an adversary whose actions can be arbitrary. In particular, this a...


R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning

R-max is a very simple model-based reinforcement learning algorithm which can attain near-optimal average reward in polynomial time. In R-max, the agent always maintains a complete, but possibly inaccurate model of its environment and acts based on the optimal policy derived from this model. The model is initialized in an optimistic fashion: all actions in all states return the maximal possible...


Solving a Stochastic Cellular Manufacturing Model by Using Genetic Algorithms

This paper presents a mathematical model for designing cellular manufacturing systems (CMSs), solved by genetic algorithms. The model assumes dynamic production, stochastic demand, routing flexibility, and machine flexibility. CMS is an application of group technology (GT) for clustering parts and machines by means of their operational and/or apparent form similarity in different aspects ...


Publication date: 1999